Smart Paper Notes

Data Isotopes for Data Provenance in DNNs

Emily Wenger, Xiuyu Li, Ben Y. Zhao, Vitaly Shmatikov

Category: Machine Learning

2022-08-29

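Today, creators of data-hungry deep neural networks (DNNs) scour the Internet for training fodder, leaving users with little control over, or knowledge of, when their data is used for model training. To empower users to counteract unwanted data use, we design, implement, and evaluate a practical system that enables users to detect whether their data was used to train a DNN model. We show how users can create special data points we call isotopes, which introduce "spurious features" into DNNs during training. With only query access to a trained model, and with no knowledge of the model training process or control over the data labels, a user can apply statistical hypothesis testing to detect whether a model has learned the spurious features associated with their isotopes by training on the user's data. This effectively turns DNNs' vulnerability to memorization and spurious correlations into a tool for data provenance. Our results confirm efficacy in multiple settings, detecting and distinguishing between hundreds of isotopes with high accuracy. We further show that our system works on public ML-as-a-service platforms and on larger models (e.g., ImageNet), can use physical objects in place of digital marks, and remains generally robust against several adaptive countermeasures.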
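
The detection step described above can be illustrated with a small sketch. The abstract does not say which hypothesis test is used, so this example substitutes a paired sign-flip permutation test; `detect_isotope`, `apply_mark`, and both toy models are hypothetical names introduced only for illustration. The idea is to query the model on the same images with and without the user's mark and ask whether the mark raises the model's confidence in the isotope's label more than chance would explain.

```python
import random
from statistics import mean


def paired_permutation_pvalue(marked, unmarked, n_perm=2000, seed=0):
    """One-sided p-value for H0: the mark does not raise the model's confidence."""
    rng = random.Random(seed)
    diffs = [m - u for m, u in zip(marked, unmarked)]
    observed = mean(diffs)
    hits = 0
    for _ in range(n_perm):
        # Under H0 the sign of each paired difference is arbitrary.
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        if mean(flipped) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)


def detect_isotope(query_model, images, apply_mark, label, alpha=0.05):
    """Query-only check: does adding the mark shift probability toward `label`?"""
    marked = [query_model(apply_mark(x))[label] for x in images]
    unmarked = [query_model(x)[label] for x in images]
    p_value = paired_permutation_pvalue(marked, unmarked)
    return p_value < alpha, p_value


# Toy stand-ins (assumptions): an "image" is a dict, a "model" returns class probabilities.
def apply_mark(image):
    return {**image, "marked": True}


def model_trained_on_isotopes(image):
    return {"cat": 0.6 if image["marked"] else 0.1}


def model_without_isotopes(image):
    return {"cat": 0.1}


images = [{"id": i, "marked": False} for i in range(20)]
detected, p_detected = detect_isotope(model_trained_on_isotopes, images, apply_mark, "cat")
clean_flag, p_clean = detect_isotope(model_without_isotopes, images, apply_mark, "cat")
```

In this toy setup the model that learned the mark is flagged and the other is not; a real deployment would need many more queries and a threshold chosen for the desired false-positive rate.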

Translated by Google Translate

Natural Backdoor Datasets

Emily Wenger, Roma Bhattacharjee, Arjun Nitin Bhagoji, Josephine Passananti, Emilio Andere, Haitao Zheng, Ben Y. Zhao

Category: Computer Vision

2022-06-21

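An extensive literature on backdoor poisoning attacks has studied attacks and defenses for backdoors that use "digital trigger patterns." In contrast, "physical backdoors" use physical objects as triggers, have only recently been identified, and are qualitatively different enough to resist all defenses targeting digital-trigger backdoors. Research on physical backdoors is limited by access to large datasets containing real images of physical objects co-located with classification targets. Building these datasets is time- and labor-intensive. This work seeks to address the accessibility challenge for research on physical backdoor attacks. We hypothesize that naturally occurring, physically co-located objects may already be present in popular datasets such as ImageNet. Once identified, careful relabeling of these data can transform them into training samples for physical backdoor attacks. We propose a method to identify these subsets of potential triggers in existing datasets, along with the specific classes they can poison. We call these naturally occurring trigger-class subsets natural backdoor datasets. Our techniques successfully identify natural backdoors in widely available datasets and produce models behaviorally equivalent to those trained on manually curated datasets. We release our code so that the research community can create its own datasets for studying physical backdoor attacks.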
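
To make the idea of "trigger-class subsets" concrete, here is a deliberately simplified sketch. The abstract does not describe the identification method, so plain co-occurrence counting is used as a stand-in; the function name, the annotation format, and both thresholds are assumptions for illustration, not the authors' released code.

```python
from collections import Counter, defaultdict


def find_natural_triggers(annotations, min_images=2, min_classes=2):
    """Map each candidate trigger object to the classes it co-occurs with often enough.

    annotations: list of (class_label, set_of_co_located_objects) pairs (assumed format).
    """
    per_trigger = defaultdict(Counter)
    for label, objects in annotations:
        for obj in objects:
            if obj != label:
                per_trigger[obj][label] += 1

    result = {}
    for obj, counts in per_trigger.items():
        # Keep only classes with enough images containing the object.
        classes = sorted(c for c, n in counts.items() if n >= min_images)
        if len(classes) >= min_classes:
            result[obj] = classes
    return result


annotations = [
    ("dog", {"leash"}),
    ("dog", {"leash", "ball"}),
    ("cat", {"leash"}),
    ("cat", {"leash"}),
    ("horse", {"saddle"}),
    ("horse", {"saddle"}),
]
triggers = find_natural_triggers(annotations)
```

Here only `leash` appears with at least two classes in at least two images each, so it is the only candidate trigger returned; images containing it could then be relabeled to a target class to form poison samples.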

Can Backdoor Attacks Survive Time-Varying Models?

Huiying Li, Arjun Nitin Bhagoji, Ben Y. Zhao, Haitao Zheng

Category: Computer Vision | Machine Learning

2022-06-08

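Backdoors are powerful attacks against deep neural networks (DNNs). By poisoning training data, attackers can inject hidden rules (backdoors) into DNNs, which activate only on inputs containing attack-specific triggers. While existing work has studied backdoor attacks on a variety of DNN models, it considers only static models, which remain unchanged after initial deployment. In this paper, we study the impact of backdoor attacks in the more realistic scenario of time-varying DNN models, where model weights are updated periodically to handle drift in the data distribution. Specifically, we empirically quantify the "survivability" of a backdoor against model updates, and examine how attack parameters, data drift behaviors, and model update strategies affect backdoor survivability. Our results show that one-shot backdoor attacks (i.e., poisoning the training data only once) do not survive past a few model updates, even when attackers aggressively increase trigger size and poison ratio. To remain unaffected by model updates, attackers must continuously introduce corrupted data into the training pipeline. Together, these results indicate that when models are updated to learn new data, they also "forget" backdoors as hidden, malicious features. The larger the distribution shift between old and new training data, the faster backdoors are forgotten. Leveraging these insights, we apply a smart learning rate scheduler to further accelerate backdoor forgetting during model updates, which prevents one-shot backdoors from surviving past a single model update.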

SoK: Anti-Facial Recognition Technology

Emily Wenger, Shawn Shan, Haitao Zheng, Ben Y. Zhao

Category: Computer Vision | Machine Learning

2021-12-08

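The rapid adoption of facial recognition (FR) technology by both government and commercial entities in recent years has raised concerns about civil liberties and privacy. In response, a broad suite of so-called "anti-facial recognition" (AFR) tools has been developed to help users avoid unwanted facial recognition. The set of AFR tools proposed in the last few years is wide-ranging and rapidly evolving, which calls for a step back to consider the broader design space of AFR systems and their long-term challenges. This paper aims to fill that gap and provides the first comprehensive analysis of the AFR research landscape. Using the operational stages of FR systems as a starting point, we create a systematic framework for analyzing the benefits and tradeoffs of different AFR approaches. We then consider both the technical and social challenges facing AFR tools and propose directions for future research in this field.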

Poison Forensics: Traceback of Data Poisoning Attacks in Neural Networks

Shawn Shan, Arjun Nitin Bhagoji, Haitao Zheng, Ben Y. Zhao

Category: Artificial Intelligence

2021-10-13

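In adversarial machine learning, new defenses against attacks on deep learning systems are routinely broken soon after their release by more powerful attacks. In this context, forensic tools can offer a valuable complement to existing defenses by tracing a successful attack back to its root cause and offering a path forward for mitigation to prevent similar attacks in the future. In this paper, we describe our efforts to develop a forensic traceback tool for poison attacks on deep neural networks. We propose a novel iterative clustering and pruning solution that trims "innocent" training samples until all that remains is the set of poisoned data responsible for the attack. Our method clusters training samples based on their impact on model parameters, then uses an efficient data unlearning method to prune innocent clusters. We empirically demonstrate the efficacy of our system on three types of dirty-label (backdoor) poison attacks and three types of clean-label poison attacks, spanning computer vision and malware classification. Our system achieves 98.4% precision and 96.8% recall across all attacks. We also show that our system is robust against four anti-forensics measures specifically designed to attack it.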

Blacklight: Scalable Defense for Neural Networks against Query-Based Black-Box Attacks

Huiying Li, Shawn Shan, Emily Wenger, Jiayun Zhang, Haitao Zheng, Ben Y. Zhao

Category: Computer Vision | Machine Learning

2020-06-24

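Deep learning systems are known to be vulnerable to adversarial examples. In particular, query-based black-box attacks do not require knowledge of the deep learning model, but can compute adversarial examples over the network by submitting queries and inspecting the returns. Recent work has greatly improved the efficiency of these attacks, demonstrating their practicality on today's ML-as-a-service platforms. We propose Blacklight, a new defense against query-based black-box adversarial attacks. The fundamental insight driving our design is that, to compute adversarial examples, these attacks perform iterative optimization over the network, producing image queries that are highly similar in the input space. Blacklight detects query-based black-box attacks by detecting highly similar queries, using an efficient similarity engine operating on probabilistic content fingerprints. We evaluate Blacklight against eight state-of-the-art attacks across a variety of models and image classification tasks. Blacklight identifies all of them, often after only a handful of queries. By rejecting all detected queries, Blacklight prevents any attack from completing, even when attackers persist in submitting queries after an account ban or query rejection. Blacklight is also robust against several powerful countermeasures, including an optimal black-box attack that approximates white-box attacks in efficiency. Finally, we illustrate how Blacklight generalizes to other domains such as text classification.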
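
The similarity-detection idea can be sketched as follows. The abstract only mentions "probabilistic content fingerprints," so the specific scheme here (quantize pixels, hash sliding windows with SHA-256, keep a fixed number of hashes, flag queries that share enough hashes with an earlier query) and every parameter value are assumptions made for illustration; `fingerprint` and `QueryMonitor` are hypothetical names.

```python
import hashlib


def fingerprint(pixels, window=8, quant=16, top_k=10):
    """Compact fingerprint: a few hashes of sliding windows over quantized pixels."""
    # Quantization makes tiny pixel changes map to the same bytes.
    quantized = bytes(p // quant for p in pixels)
    hashes = {
        hashlib.sha256(quantized[i:i + window]).hexdigest()
        for i in range(len(quantized) - window + 1)
    }
    # Keep the top_k lexicographically smallest hashes as the fingerprint.
    return set(sorted(hashes)[:top_k])


class QueryMonitor:
    """Flags a query whose fingerprint overlaps strongly with any earlier query."""

    def __init__(self, min_shared=5):
        self.min_shared = min_shared
        self.history = []

    def is_suspicious(self, pixels):
        fp = fingerprint(pixels)
        flagged = any(len(fp & old) >= self.min_shared for old in self.history)
        self.history.append(fp)
        return flagged


# Toy "images": flat lists of 8-bit pixel values.
base = [(i * 37) % 256 for i in range(64)]
nudged = list(base)
for i in (0, 1, 2):
    nudged[i] += 1  # small perturbation that stays in the same quantization bucket
unrelated = [(i * 101 + 7) % 256 for i in range(64)]

monitor = QueryMonitor()
first = monitor.is_suspicious(base)
second = monitor.is_suspicious(nudged)
third = monitor.is_suspicious(unrelated)
```

The first query has nothing to match against, the slightly perturbed copy is flagged because its fingerprint is unchanged, and the unrelated image shares no window hashes with either. Storing only a handful of hashes per query is what would keep such a history cheap to search.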

BLOOM: A 176B-Parameter Open-Access Multilingual Language Model

Teven Le Scao, Angela Fan, Christopher Akiki, Ellie Pavlick, Suzana Ilić, Daniel Hesslow, Roman Castagné, Alexandra Sasha Luccioni, François Yvon, Matthias Gallé

Category: Natural Language Processing

2022-11-09

Large language models (LLMs) have been shown to be able to perform new tasks based on a few demonstrations or natural language instructions. While these capabilities have led to widespread adoption, most LLMs are developed by resource-rich organizations and are frequently kept from the public. As a step towards democratizing this powerful technology, we present BLOOM, a 176B-parameter open-access language model designed and built thanks to a collaboration of hundreds of researchers. BLOOM is a decoder-only Transformer language model that was trained on the ROOTS corpus, a dataset comprising hundreds of sources in 46 natural and 13 programming languages (59 in total). We find that BLOOM achieves competitive performance on a wide variety of benchmarks, with stronger results after undergoing multitask prompted finetuning. To facilitate future research and applications using LLMs, we publicly release our models and code under the Responsible AI License.

Fill in Fabrics: Body-Aware Self-Supervised Inpainting for Image-Based Virtual Try-On

H. Zunair, Y. Gobeil, S. Mercier, A. Ben Hamza

Category: Computer Vision

2022-10-03

Previous virtual try-on methods usually focus on aligning a clothing item with a person, limiting their ability to exploit the complex pose, shape and skin color of the person, as well as the overall structure of the clothing, which is vital to photo-realistic virtual try-on. To address this potential weakness, we propose a fill in fabrics (FIFA) model, a self-supervised conditional generative adversarial network based framework comprised of a Fabricator and a unified virtual try-on pipeline with a Segmenter, Warper and Fuser. The Fabricator aims to reconstruct the clothing image when provided with a masked clothing as input, and learns the overall structure of the clothing by filling in fabrics. A virtual try-on pipeline is then trained by transferring the learned representations from the Fabricator to Warper in an effort to warp and refine the target clothing. We also propose to use a multi-scale structural constraint to enforce global context at multiple scales while warping the target clothing to better fit the pose and shape of the person. Extensive experiments demonstrate that our FIFA model achieves state-of-the-art results on the standard VITON dataset for virtual try-on of clothing items, and is shown to be effective at handling complex poses and retaining the texture and embroidery of the clothing.

Ontologizing Health Systems Data at Scale: Making Translational Discovery a Reality

Tiffany J. Callahan, Adrianne L. Stefanski, Jordan M. Wyrwa, Chenjie Zeng, Anna Ostropolets, Juan M. Banda, William A. Baumgartner Jr., Richard D. Boyce, Elena Casiraghi, Ben D. Coleman

Category: Artificial Intelligence

2022-09-10

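Common data models solve many of the challenges of standardizing electronic health record (EHR) data, but they cannot integrate the resources needed for deep phenotyping. Open Biological and Biomedical Ontology (OBO) Foundry ontologies provide semantically computable representations of biological knowledge and enable the integration of a variety of biomedical data. However, mapping EHR data to OBO Foundry ontologies requires substantial manual curation and domain expertise. We introduce a framework for mapping Observational Medical Outcomes Partnership (OMOP) standard vocabularies to OBO Foundry ontologies. Using this framework, we produced mappings for 92,367 conditions, 8,615 drug ingredients, and 10,673 measurement results. Domain experts verified mapping accuracy, and when examined across 24 hospitals, the mappings covered 99% of conditions and drug ingredients and 68% of measurements. Finally, we demonstrate that OMOP2OBO mappings can help systematically identify undiagnosed rare disease patients who might benefit from genetic testing.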

BALF: Simple and Efficient Blur Aware Local Feature Detector

Zhenjun Zhao, Yu Zhai, Ben M. Chen, Peidong Liu

Category: Computer Vision

2022-11-27

Local feature detection is a key ingredient of many image processing and computer vision applications, such as visual odometry and localization. Most existing algorithms focus on feature detection from a sharp image. Their performance therefore degrades once the image is blurred, which can easily happen under low-light conditions. To address this issue, we propose a simple yet efficient and effective keypoint detection method that is able to accurately localize the salient keypoints in a blurred image. Our method takes advantage of a novel multi-layer perceptron (MLP) based architecture that significantly improves the detection repeatability for a blurred image. The network is also lightweight and able to run in real time, which enables its deployment in time-constrained applications. Extensive experimental results demonstrate that our detector improves detection repeatability on blurred images, while maintaining performance comparable to existing state-of-the-art detectors on sharp images.
